Data in Brief
Elsevier BV
Preprints posted in the last 7 days, ranked by how well they match Data in Brief's content profile, based on 13 papers previously published here. The average preprint has a 0.02% match score for this journal, so anything above that is already an above-average fit.
Bauman, A.; Owen, K.; Messing, S.; Macdonald, H.; Nettlefold, L.; Richards, J.; Vandelanotte, C.; Chen, I.-H.; Cullen, B.; van Buskirk, J.; van Itallie, A.; Coletta, G.; O'Halloran, P.; Randle, E.; Nicholson, M.; Staley, K.; McKay, H. A.
Military aviation training noise remains understudied despite its widespread impacts across urban, rural, and wilderness areas. The predominance of low-frequency noise and repetitive training can create pervasive noise pollution, yet past research often fails to capture the full range of health and quality-of-life effects. This study analyzed two complaint datasets related to Whidbey Island Naval Air Station noise: U.S. Navy records (2017-2020) and Quiet Skies Over San Juan County data (2021-2023). We analyzed and mapped sentiment intensity from noise complaints relative to modeled annual noise exposure, developed a typology to classify impacts, and modeled the environmental and operational factors influencing complaints. Findings revealed widespread negative sentiment and anger, often beyond the bounds of estimated noise contours, suggesting that annual cumulative noise models inadequately estimate community impacts. Complaints consistently highlighted sleep disturbance, hearing and health concerns, and compromised home environments due to shaking, vibration, and disruption of daily life. Residents also reported significant social, recreational, and work disruptions, along with feelings of fear, helplessness, and concern for children's well-being. The number of complaints was strongly associated with training schedules, with late-night sessions being the strongest predictor. A delayed response pattern suggests residents reach a frustration threshold before filing complaints. Overall, our findings demonstrate persistent negative sentiment and diverse impacts from military aviation noise. Results highlight the need for improved noise metrics, modeling, and operational adjustments to mitigate the most disruptive effects.
Mohsini, K.; Gore-Langton, G. R.; Rathod, S. D.; Mansfield, K. E.; Warren-Gash, C.
Aims: Indoor air pollution resulting from combustion of unclean cooking fuels has been linked to adverse health outcomes, but evidence regarding its association with mental health in low- and middle-income countries remains limited. We investigated the association between household use of unclean cooking fuels, as a proxy for indoor air pollution, and depression symptoms among adults aged 45 years and older in India, and assessed effect modification by age, sex, caste, and rural/urban residence. Methods: We conducted a cross-sectional analysis of the first wave (2017-2018) of data from the Longitudinal Aging Study in India (LASI), a nationally representative survey of adults aged ≥45 years. Cooking fuel type was classified as clean or unclean, and depression symptoms were assessed using the 10-item Centre for Epidemiologic Studies Depression (CES-D-10) scale. We used logistic regression to estimate odds ratios for depression symptoms, and linear regression to compare mean CES-D-10 scores by cooking fuel type, adjusting for sociodemographic and housing characteristics. Results: We included 62,650 respondents. Median age was 57 years (IQR: 50-65), 46.7% were women, 47.6% reported using unclean cooking fuels, and 27.6% screened positive on the CES-D-10. After adjusting for sociodemographic and housing characteristics, use of unclean cooking fuels was associated with higher odds of screening positive on the CES-D-10 (aOR: 1.08; 95% CI: 1.02, 1.15), and higher mean CES-D-10 scores (adjusted mean difference: 0.34; 95% CI: 0.24, 0.44). The association was more pronounced among individuals living in urban areas (aOR: 1.36; 95% CI: 1.21, 1.53). Conclusion: Use of unclean cooking fuels was associated with depression symptoms among older adults in India, and especially among those living in urban areas.
Dai, H.-J.; Mir, T. H.; Fang, L.-C.; Chen, C.-T.; Feng, H.-H.; Lai, J.-R.; Hsu, H.-C.; Nandy, P.; Panchal, O.; Liao, W.-H.; Tien, Y.-Z.; Chen, P.-Z.; Lin, Y.-R.; Jonnagaddala, J.
Accurate recognition and deidentification of sensitive health information (SHI) in spoken dialogues requires multimodal algorithms that can understand medical language and contextual nuance. However, the recognition and deidentification process itself risks exposing SHI. Additionally, the variability and complexity of medical terminology, along with the inherent biases in medical datasets, further complicate this task. This study introduces the SREDH/AI-Cup 2025 Medical Speech Sensitive Information Recognition Challenge, which focuses on two tasks: Task 1, speech transcription, in which systems must accurately transcribe speech into text; and Task 2, medical speech de-identification, in which systems must detect and appropriately classify mentions of SHI. The competition attracted 246 teams; top-performing systems achieved a mixed error rate (MER) of 0.1147 and a macro F1-score of 0.7103, with average MER and macro F1-score of 0.3539 and 0.2696, respectively. Results were presented at the IW-DMRN workshop in 2025. Notably, the results reveal that LLMs were prevalent across both tasks: 97.5% of teams adopted LLMs for Task 1 and 100% for Task 2, highlighting their growing role in healthcare. Furthermore, we fine-tuned six models, demonstrating strong precision (~0.885-0.889) with slightly lower recall (~0.830-0.847), resulting in F1-scores of 0.857-0.867.
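The reported F1 range can be checked against the reported precision and recall ranges. The sketch below is only an illustration: it assumes the lowest precision pairs with the lowest recall and the highest with the highest, which the abstract does not state, and the function names are hypothetical.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(per_class: list[tuple[float, float]]) -> float:
    """Unweighted mean of per-class F1 scores (macro averaging)."""
    return sum(f1_score(p, r) for p, r in per_class) / len(per_class)


# Assumed pairing of the range endpoints quoted in the abstract.
low = f1_score(0.885, 0.830)
high = f1_score(0.889, 0.847)
print(f"{low:.3f} {high:.3f}")
```

Under that pairing, the endpoints round to 0.857 and 0.867, matching the quoted F1 range.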
Ng, J. Y.; Tan, J.; Syed, N.; Adapa, K.; Gupta, P. K.; Li, S.; Mehta, D.; Ring, M.; Shridhar, M.; Souza, J. P.; Yoshino, T.; Lee, M. S.; Cramer, H.
Background: Generative artificial intelligence (GenAI) chatbots have shown utility in assisting with various research tasks. Traditional, complementary, and integrative medicine (TCIM) is a patient-centric approach that emphasizes holistic well-being. The integration of TCIM and GenAI presents numerous key opportunities. However, TCIM researchers' attitudes toward GenAI tools remain poorly understood. This large-scale, international cross-sectional survey aimed to elucidate the attitudes and perceptions of TCIM researchers regarding the use of GenAI chatbots in the scientific process. Methods: A search strategy in Ovid MEDLINE identified corresponding authors who were TCIM researchers. Eligible authors were invited to complete an anonymous online survey administered via SurveyMonkey. The survey included questions on socio-demographic characteristics, familiarity with GenAI chatbots, and perceived benefits and challenges of using GenAI chatbots. Results were analysed using descriptive statistics and thematic content analysis. Results: The survey received 716 responses. Most respondents reported familiarity with GenAI chatbots (58.08%) and viewed them as very important to the future of scientific research (54.37%). The most acknowledged benefits included workload reduction (74.07%) and increased efficiency in data analysis/experimentation (71.14%). The most frequently reported challenges involved bias, errors, and limitations. More than half of the respondents (57.02%) expressed a need for training to use GenAI chatbots in the scientific process, alongside an interest in receiving training (72.07%). However, 43.67% indicated that their institutions did not offer these programs. Discussion: A deeper understanding of TCIM researchers' perspectives can inform future AI applications in this field and guide future policies and collaboration among researchers.
Bhansali, R.; Gorenshtein, A.; Westover, B.; Goldenholz, D. M.
Manuscript preparation is a critical bottleneck in scientific publishing, yet existing AI writing tools require cloud transmission of sensitive content, creating data-confidentiality barriers for clinical researchers. We introduce the Paper Analysis Tool (PAT), a free, multi-agent framework that deploys 31 specialized agents powered by small language models (SLMs) to audit manuscripts across multiple quality dimensions without external data transmission. Applied to three published clinical neurological papers, PAT generated 540 evaluable suggestions. Validation by two expert reviewers (R.B., A.G.) confirmed 391 actionable, high-value revisions (90% agreement), achieving a 72.4% overall usefulness accuracy spanning methodological, statistical, and visual domains. Furthermore, deterministic re-evaluation of 126 agent-suggested rewrite pairs using Phase 0 metrics confirmed text improvement: total word count decreased by 25%, passive voice prevalence dropped sharply from 35% to 5%, average sentence length decreased by 24%, long-sentence fraction fell by 67%, and the Flesch-Kincaid grade improved by 17%. Our validation confirms that systematic, agent-driven pre-submission review drives measurable improvements, successfully converting manuscript optimization from an opaque, manual endeavor into a transparent and rigorous scientific process.
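Deterministic before/after text metrics of the kind listed above can be sketched as follows. This is not PAT's implementation: the sentence splitter, the 30-word threshold for a "long" sentence, and all function names are assumptions made for illustration.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def sentence_stats(text: str, long_threshold: int = 30) -> dict[str, float]:
    """Word count, mean sentence length, and fraction of long sentences."""
    lengths = [len(s.split()) for s in split_sentences(text)]
    if not lengths:
        return {"words": 0, "avg_sentence_length": 0.0, "long_fraction": 0.0}
    return {
        "words": sum(lengths),
        "avg_sentence_length": sum(lengths) / len(lengths),
        # long_threshold is an assumed cut-off, not taken from the abstract.
        "long_fraction": sum(n > long_threshold for n in lengths) / len(lengths),
    }


def relative_change(before: float, after: float) -> float:
    """Signed fractional change from before to after (negative = decrease)."""
    return (after - before) / before


# Hypothetical rewrite pair, invented for this example.
before = "The samples were analyzed by the team in the laboratory. The results were then reported."
after = "The team analyzed the samples. They reported the results."
b, a = sentence_stats(before), sentence_stats(after)
print(f"word count change: {relative_change(b['words'], a['words']):+.0%}")
```

Because every step is a pure function of the text, rerunning it on the same rewrite pair always yields the same numbers, which is what makes such a re-evaluation deterministic.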
Sivakumar, E.; Anand, A.
Computer vision and deep learning techniques, including convolutional neural networks (CNNs) and transformers, have increased the performance of medical image classification systems. However, training deep learning models using medical images is a challenging task that necessitates a substantial amount of annotated data. In this paper, we implement data augmentation strategies to tackle dataset imbalance in the VinDr-SpineXR dataset, which has a lower number of spine abnormality X-ray images compared to normal spine X-ray images. Geometric transformations and synthetic image generation using Generative Adversarial Networks are explored and applied to the abnormal classes of the dataset, and classifier performance is validated using VGG-16 and InceptionNet to identify the most effective augmentation technique. Additionally, we introduce a hybrid augmentation technique that addresses class imbalance, reduces computational overhead relative to a GAN-only approach, and achieves ~99% validation accuracy with both classifiers across all three case studies. Keywords: Data augmentation, Generative Adversarial Network, VGG-16, InceptionNet, Class imbalance, Computer vision, Spine X-ray, Radiology.
Walton, A. E.; Versalovic, E.; Merner, A. R.; Lazaro-Munoz, G.; Bush, A.; Richardson, M.
Patients who participate in intracranial neuroscience research make invaluable contributions to our understanding of the brain, accelerating the development of neurotechnological interventions. Engagement of patients as part of this research presents unique challenges, where study goals can be distant from immediate clinical applications and require specialized domain knowledge. Yet methods for meaningfully integrating patient communities as part of these research efforts are essential, as intracranial neuroscience guides the application of artificial intelligence for understanding and enhancing human cognition. To identify what patients consider meaningful research engagement, we interviewed individuals who participated in a study during their Deep Brain Stimulation (DBS) surgery and attended a group event where they interacted with our research team. Analysis of semi-structured interviews identified four main themes: interest in science and the future of clinical care, contributing to science to improve lives, connecting with others, and accessibility considerations. Based on these insights, we propose strategies for transformational participation of patient communities in intracranial neuroscience research with respect to engagement objectives, communication, and scope. This approach offers a foundation for sustaining relationships between scientists and communities rooted in trust and transparency, to ensure that impacts of neurotechnology on human health and cognition are aligned with patient needs as well as desired public values.
Yu, J.; Tillema, S.; Akel, M.; Aron, A.; Espinosa, E.; Fisher, S. A.; Branche, T. N.; Mithal, L. B.; Hartmann, E. M.
Benzalkonium chloride (BAC) is widely used as a disinfectant in cleaning products and is frequently detected in indoor dust. In this study, we assessed dust samples, along with information on cleaning product use, from 24 pregnant participants. Dust samples were analyzed for BAC concentration and microbial tolerance. Different chain lengths of BAC (C12, C14, and C16) were quantified using LC-MS/MS, and bacterial isolates were tested for BAC tolerance using minimum inhibitory concentration (MIC) assays. BAC was ubiquitously detected, with C12 and C14 being dominant. Higher BAC concentrations were associated with reported disinfectant use and increased microbial tolerance. These findings suggest that indoor antimicrobial use may promote microbial resistance, highlighting potential exposure risks in indoor environments and the need for further investigation into health and ecological impacts.
Wang, S.; Ayubcha, C.; Hua, Y.; Beam, A.
Background: Developing generalizable neuroimaging models is often hindered by limited labeled data, which has led to increased interest in unsupervised inverse learning. Existing approaches often neglect geometric principles and struggle with diverse pathologies. We propose a symmetry-informed inverse learning foundation model to address these shortcomings for robust and efficient anomaly detection in brain MRI. Methods: Our framework employs a reconstruction-to-embedding pipeline, trained exclusively on healthy brain MRI slices. A 2D U-Net uses a novel, symmetry-aware masking strategy to reconstruct a disorder-free slice. Difference maps are embedded into a 1024-dimensional latent space via a Beta-VAE. Anomaly scoring is performed using Mahalanobis distance. We evaluated generalization by fine-tuning on external datasets: the BraTS Africa lesion dataset (SSA) and the ADNI-derived Alzheimer disease cohort (Alz). Results: On the source metastasis (Mets) dataset, the framework achieved high performance (AB1+MSE: 99.28% accuracy, 99.79% sensitivity). Generalization to the external lesion dataset (SSA) was robust, with the Symmetry ROC configuration achieving 91.93% accuracy. Transfer to the Alzheimer dataset (Alz) was more challenging, achieving a peak accuracy of 70.54% with a high false-positive rate, suggesting difficulty in separating subtle, diffuse changes. Conclusion: The symmetry-informed inverse learning framework establishes a robust foundation model for neuroimaging, showing strong performance for focal lesions and successful generalization under domain shift. Limitations in diffuse neurodegeneration underscore the necessity for richer representations and multimodal integration to improve future foundation models.
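Mahalanobis-distance anomaly scoring can be illustrated on a toy problem. The sketch below uses 2-dimensional embeddings and a closed-form 2x2 covariance inverse so that it needs only the standard library; the abstract's 1024-dimensional latent space would require a general matrix inverse, and the toy data and function names here are assumptions rather than the authors' code.

```python
import math

Point = tuple[float, float]


def mean_vector(rows: list[Point]) -> Point:
    """Component-wise mean of the healthy reference embeddings."""
    n = len(rows)
    return (sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n)


def covariance_2d(rows: list[Point], mu: Point) -> tuple[float, float, float]:
    """Sample covariance entries (sxx, sxy, syy) with the n-1 denominator."""
    n = len(rows)
    sxx = sum((r[0] - mu[0]) ** 2 for r in rows) / (n - 1)
    syy = sum((r[1] - mu[1]) ** 2 for r in rows) / (n - 1)
    sxy = sum((r[0] - mu[0]) * (r[1] - mu[1]) for r in rows) / (n - 1)
    return sxx, sxy, syy


def mahalanobis_2d(x: Point, mu: Point, cov: tuple[float, float, float]) -> float:
    """sqrt((x - mu)^T S^{-1} (x - mu)) using the explicit 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    q = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(max(q, 0.0))


# Toy "healthy" embeddings (invented values) define the reference distribution.
healthy = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
mu = mean_vector(healthy)
cov = covariance_2d(healthy, mu)
typical_score = mahalanobis_2d((0.5, 0.5), mu, cov)
anomaly_score = mahalanobis_2d((3.0, 3.0), mu, cov)
print(typical_score, anomaly_score)
```

A slice would then be flagged when its score exceeds a threshold chosen on held-out healthy data; how the threshold is actually set is not described in the abstract.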
Johansson, J.; Palonen, S.; Egorova, K.; Tuisku, J.; Harju, H.; Kärpijoki, H.; Maaniitty, T.; Saraste, A.; Saari, T.; Tuomola, N.; Rinne, J.; Nuutila, P.; Latva-Rasku, A.; Virtanen, K. A.; Knuuti, J.; Nummenmaa, L.
Background: Quantitative cerebral blood flow (CBF) measured with [15O]water positron emission tomography (PET) is the reference standard for quantifying brain perfusion. However, clinical interpretation of individual CBF measurements is limited by the absence of large normative datasets accounting for physiological variability across the adult lifespan. Long-axial field-of-view PET enables high-sensitivity quantitative [15O]water perfusion imaging without arterial blood sampling, allowing normative characterization of cerebral perfusion at unprecedented scale. The aim of this study was to establish normative and covariate-adjusted models of cerebral blood flow across the adult lifespan using total-body [15O]water PET. Methods: Quantitative CBF measurements were obtained in 302 neurologically healthy adults (age 21-86 years) using total-body [15O]water PET. Linear mixed-effects models were used to evaluate the effects of age, sex, body mass index (BMI), and blood hemoglobin concentration on CBF and to generate normative prediction models across the adult lifespan. Between-subject and within-subject variability were estimated from repeated scans in a subset of participants (n=51). Results: Mean grey matter CBF was 46.1 mL/(min*dL), with substantial inter-individual variability but high within-subject reproducibility (intraclass correlation coefficients 0.78-0.89). Advancing age was associated with a decline in CBF of approximately 7% per decade (p_FDR < 10^-12). Higher BMI was associated with lower CBF (approximately -6% per 10 kg/m^2; p_FDR < 0.01). Women exhibited higher CBF than men (approximately 7.5%), but this difference was largely explained by lower blood hemoglobin concentration in women. Covariate-adjusted models were used to generate normative predictions and prediction intervals describing expected CBF across adulthood.
Conclusion: This study establishes a normative database of quantitative cerebral blood flow across the adult lifespan using high-sensitivity [15O]water PET. Age, BMI, and hemoglobin are major determinants of inter-individual variability in CBF. The resulting generative models provide a quantitative reference framework for interpreting cerebral perfusion measurements and may enable automated detection of abnormal brain perfusion in clinical PET imaging.
Hakata, Y.; Oikawa, M.; Fujisawa, S.
Background. Adult diffuse glioma is a representative class of primary brain tumors for which accurate MRI-based tumor segmentation is indispensable for treatment planning. Conventional automated segmentation methods have relied primarily on image information and spatial prompts, and auxiliary clinical information that is routinely acquired in clinical practice has not been sufficiently exploited as an input. Objective. Building on a dual-prompt-driven Segment Anything Model (SAM) extension framework that fuses visual and language reference prompts, we propose a method that integrates patient demographics, unsupervised molecular cluster variables derived from TCGA high-throughput profiling, and histopathological parameters as learnable prompt embeddings, and we evaluate its effect on the accuracy of lower-grade glioma (LGG) MRI segmentation. Methods. An auxiliary prompt encoder converts clinical metadata into high-dimensional embeddings that are fused with the prompt representations of Segment Anything Model (SAM) ViT-B through a cross-attention fusion mechanism. The TCGA-LGG MRI Segmentation dataset (Kaggle release by Buda et al.; n = 110 patients; WHO grade II-III) was split at the patient level (train/val/test = 71/17/22) using three different random seeds, and the three slices with the largest tumor area were extracted from each patient. To avoid pseudo-replication arising from multiple slices per patient and repeated measurements across seeds, our primary analysis aggregated Dice and 95th-percentile Hausdorff distance (HD95) to the patient x seed unit (n = 66); secondary analyses at the unique-patient level (n = 22) and at the per-slice level (n = 198) are also reported. Pairwise comparisons used paired t-tests with Bonferroni correction (k = 3) and Wilcoxon signed-rank tests, and a permutation test (K = 30) served as an auxiliary check of effective use of the auxiliary information. Results. 
At the patient x seed level (n = 66), Proposed (full clinical) achieved a Dice gain of +0.287 over the zero-shot SAM ViT-B baseline (paired-t p = 4.2 x 10^-15, Cohen's d_z = +1.25, Bonferroni-corrected p << 0.001; Wilcoxon p = 2.0 x 10^-10), and HD95 improved from 218.2 to 64.6. Because zero-shot SAM is not designed for domain-specific medical segmentation, the large absolute HD95 gap largely reflects the expected domain gap rather than a competitive baseline. The additional contribution of the full clinical configuration over the demographics-only configuration was Dice = +0.023 (paired-t p = 0.057, Bonferroni-corrected p = 0.172), which did not reach statistical significance at the patient level and is reported as a directional trend. The permutation test (K = 30, seed 2025) yielded real-metadata Dice = 0.819 versus a shuffled-metadata mean of 0.773, giving an empirical p = 0.032 = 1/(K + 1), which is at the resolution limit of this test and should therefore be interpreted as preliminary evidence. Conclusions. Integrating auxiliary clinical information as multimodal prompts produced a large improvement over the zero-shot SAM baseline on this LGG cohort. More importantly, a robustness analysis showed that Proposed (full clinical) outperformed the trained Base (no auxiliary information) under all tested spatial-prompt conditions, including perfect centroid (+0.014), and that the advantage was most pronounced in the prompt-free regime (+0.231, p = 0.039), where the base model collapsed but the proposed model maintained meaningful segmentation by leveraging clinical metadata alone. The additional contribution of molecular and histopathological information beyond demographics was not statistically resolved at the patient level (+0.023, n.s.). Establishing clinical utility will require external validation on larger multi-center cohorts and direct comparisons with established segmentation methods. 
Keywords: brain tumor segmentation; Segment Anything Model (SAM); vision-language prompt-driven segmentation; auxiliary clinical prompts; multimodal learning; TCGA-LGG; deep learning
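The "resolution limit" remark about the permutation test follows from simple arithmetic, sketched below. The add-one estimator is an assumption consistent with the abstract's p = 1/(K + 1); the shuffled scores are placeholders set to the reported shuffled mean (the individual values are not given), and the function names are hypothetical.

```python
def permutation_p_value(observed: float, shuffled_scores: list[float]) -> float:
    """Add-one empirical p-value: (1 + #{shuffled >= observed}) / (K + 1)."""
    k = len(shuffled_scores)
    exceed = sum(s >= observed for s in shuffled_scores)
    return (1 + exceed) / (k + 1)


def bonferroni(p: float, n_comparisons: int) -> float:
    """Bonferroni-corrected p-value, capped at 1."""
    return min(1.0, p * n_comparisons)


# Placeholder data: K = 30 shuffled-metadata Dice scores, all set to the
# reported shuffled mean of 0.773 because individual values are unknown.
shuffled = [0.773] * 30
p_perm = permutation_p_value(0.819, shuffled)
print(f"{p_perm:.3f}")  # smallest p-value attainable with K = 30

# Correcting the rounded p = 0.057 for k = 3 comparisons.
p_corrected = bonferroni(0.057, 3)
print(f"{p_corrected:.3f}")
```

With K = 30 no p-value below 1/31 ≈ 0.032 is attainable, so a more decisive result would need more permutations. The corrected value here is 0.171 rather than the reported 0.172, presumably because the abstract's figure was computed from the unrounded p-value.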
Gangolli, M.; Perkins, N. J.; Marinelli, L.; Basser, P. J.; Avram, A. V.
BACKGROUND: Mild traumatic brain injury (mTBI) is a signature injury in civilian and military populations that remains invisible to detection by conventional radiological methods. Diffusion MRI has been identified as a potential clinical tool for revealing subtle microstructural alterations associated with mTBI. OBJECTIVE: This study evaluates whether a comprehensive and powerful diffusion MRI (dMRI) technique called mean apparent propagator (MAP) MRI can detect sequelae of mTBI. METHODS: We analyzed data from 417 participants of the GE/NFL prospective mTBI study which included 143 matched controls (mean age, 21.9 ± 8.3 years; 76 women) and 274 patients with acute mTBI and GCS ≥13 (mean age, 21.9 ± 8.5 years; 131 women). All participants underwent MRI exams at up to four visits including structural high-resolution T1W, T2W, FLAIR-T2W, and dMRI, in addition to clinical assessments of post-concussive physical symptoms (RPQ-3), psychosocial functioning and lifestyle symptoms (RPQ-13), and postural stability (BESS). The dMRI data for each subject were co-registered across all visits and analyzed using the MAP-MRI framework to measure and map the distribution of net microscopic displacements of diffusing water molecules in tissue and ultimately compute the microstructural MAP-MRI tissue parameters including propagator anisotropy (PA), Non-Gaussianity (NG), return-to-origin probability (RTOP), return-to-axis probability (RTAP), and return-to-plane probability (RTPP). We quantified voxel-wise and region-of-interest (ROI)-based changes in these parameters across all four visits. RESULTS: MAP-MRI parameter values were within the expected ranges and showed relatively little variation across visits. We found no significant differences in the longitudinal trajectories of these parameters between mTBI patients and controls.
At acute post-injury timepoints, RPQ-3 and RPQ-13 scores were increased in mTBI patients relative to controls, while BESS scores were not significantly different between groups. Analysis of dMRI metrics and clinical mTBI markers showed significant correspondence between MAP-MRI metrics in cortical gray matter, caudate and pallidum and BESS scores. CONCLUSION: We developed and tested a state-of-the-art quantitative image processing pipeline for sensitive analysis and detection of subtle tissue changes in longitudinal clinical diffusion MRI data. The absence of a significant statistical difference between populations in the dMRI parameters in this study suggests that the mTBI corresponded to acute post-injury clinical symptoms but that the injury was not severe enough to cause detectable microstructural damage/alterations, and that increased diffusion sensitization combined with improved analysis techniques may be needed. CLINICAL IMPACT: These findings suggest that acute mTBI (GCS ≥13) may not be detectable with diffusion MRI. TRIAL REGISTRATION: ClinicalTrials.gov NCT02556177
Chihara, A.; Mizuno, R.; Kagawa, N.; Takayama, A.; Okumura, A.; Suzuki, M.; Shibata, Y.; Mochii, M.; Ohuchi, H.; Sato, K.; Suzuki, K.-i. T.
Fluorescent in situ hybridization (FISH) enables highly sensitive, high-resolution detection of gene transcripts. Moreover, by employing multiple probes, this technique allows for multiplexed, simultaneous detection of distinct gene expression patterns spatiotemporally, making it a valuable spatial transcriptomics approach. Owing to these advantages, FISH techniques are rapidly being adopted across diverse areas of basic biology. However, conventional protocols often rely on volatile, toxic reagents such as formalin or methanol, posing potential health risks to researchers. Here, we present a safer protocol that replaces these chemicals with low-toxicity alternatives, without compromising the high detection sensitivity of FISH. We validated this protocol using both in situ hybridization chain reaction (HCR) and signal amplification by exchange reaction (SABER)-FISH in frozen sections of various model organisms, including mouse (Mus musculus), amphibians (Xenopus laevis and Pleurodeles waltl), and medaka (Oryzias latipes). Our results demonstrate successful multiplexed detection of morphogenetic and cell-type marker genes in these model animals using this safer protocol. The protocol has the additional advantage of requiring no proteolytic enzyme treatment, thus preserving tissue integrity. Furthermore, we show that this protocol is fully compatible with EGFP immunostaining, allowing for the simultaneous detection of mRNAs and reporter proteins in transgenic animals. This protocol retains the benefits of highly sensitive, multiplexed, and multimodal detection afforded by integrating in situ HCR and SABER-FISH with immunohistochemistry, while providing a safer option for researchers, thereby offering a valuable tool for basic biology.
Undurraga Lucero, J. A.; Chesnaye, M.; Simpson, D.; Laugesen, S.
Objective detection of evoked potentials (EPs) is central to digital diagnostics in hearing assessment and clinical neurophysiology, yet current approaches remain time-intensive and sensitive to inter-individual noise variability. Many existing detection methods rely on population-based assumptions or computationally demanding procedures, limiting robustness and efficiency in real-world clinical settings. We present Fmpi, a digital EP detection framework enabling individualised, real-time response detection through analytical modelling of the spectral colour and temporal dynamics of background noise within each recording. Using extensive simulations and large-scale human electroencephalography datasets spanning brainstem, steady-state, and cortical EPs recorded in adults and infants, we demonstrate performance comparable or superior to state-of-the-art bootstrapped methods while operating at a fraction of the computational cost and maintaining well-controlled sensitivity with improved specificity. Importantly, Fmpi incorporates a futility detection mechanism enabling early termination of uninformative recordings, reducing testing time without compromising diagnostic reliability.
Gartlehner, G.; Banda, S.; Callaghan, M.; Chase, J.-A.; Dobrescu, A.; Eisele-Metzger, A.; Flemyng, E.; Gardner, S.; Griebler, U.; Helfer, B.; Jemiolo, P.; Macura, B.; Minx, J. C.; Noel-Storr, A.; Rajabzadeh Tahmasebi, N.; Sharifan, A.; Meerpohl, J.; Thomas, J.
Background: Artificial intelligence (AI) has the potential to improve the efficiency of evidence synthesis and reduce human error. However, robust methods for evaluating rapidly evolving AI tools within the practical workflows of evidence synthesis remain underdeveloped. This protocol describes a study design for assessing the effectiveness, efficiency, and usability of AI tools in comparison to traditional human-only workflows in the context of Cochrane systematic reviews. Methods: Members of the Cochrane Evaluation of (Semi-) Automated Review (CESAR) Methods Project developed an adaptive platform study-within-a-review (SWAR) design, modeled after clinical platform trials. This design employs a master protocol to concurrently evaluate multiple AI tools (interventions) against a standard human-only process (control) across three key review tasks: title and abstract screening, full-text screening, and data extraction. The adaptive framework allows for the addition or removal of AI tools based on interim performance analyses without necessitating a restart of the study. Performance will be assessed using metrics such as accuracy (sensitivity, specificity, precision), efficiency (time on task), response stability, impact of errors, and usability, in alignment with Responsible use of AI in evidence SynthEsis (RAISE) principles. Results: The study will generate comparative data about the performance and usability of specific AI tools employed in a semi- or fully automated manner relative to standard human effort. The protocol provides a flexible framework for the assessment of AI tools in evidence synthesis, addressing the limitations of static, one-time evaluations. Discussion: This study protocol presents a novel methodological approach to addressing the challenges of evaluating AI tools for evidence syntheses. 
By validating entire workflows rather than individual technologies, the findings will establish an evidence base for determining the viability of integrating AI into evidence-synthesis workflows. The adaptive design of this study is flexible and can be adopted by other investigators, ensuring that the evaluation framework remains relevant as new tools emerge.
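The accuracy metrics named in the protocol (sensitivity, specificity, precision) can be computed from a tool's screening decisions against a human reference standard. The sketch below is an assumption-laden illustration, not the CESAR analysis code: the record IDs, the set-based representation, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningCounts:
    tp: int  # included by both the tool and the reference
    fp: int  # included by the tool only
    tn: int  # excluded by both
    fn: int  # included by the reference only


def tally(tool_included: set[str], reference_included: set[str],
          all_records: set[str]) -> ScreeningCounts:
    """Cross-tabulate tool decisions against the reference standard."""
    tp = len(tool_included & reference_included)
    fp = len(tool_included - reference_included)
    fn = len(reference_included - tool_included)
    tn = len(all_records) - tp - fp - fn
    return ScreeningCounts(tp, fp, tn, fn)


def ratio(numerator: int, denominator: int) -> float:
    """Safe division; NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def sensitivity(c: ScreeningCounts) -> float:
    return ratio(c.tp, c.tp + c.fn)


def specificity(c: ScreeningCounts) -> float:
    return ratio(c.tn, c.tn + c.fp)


def precision(c: ScreeningCounts) -> float:
    return ratio(c.tp, c.tp + c.fp)


# Hypothetical title/abstract screening of ten records.
records = {f"r{i}" for i in range(10)}
reference = {"r0", "r1", "r2"}
tool = {"r0", "r1", "r3", "r4"}
counts = tally(tool, reference, records)
print(counts, sensitivity(counts), specificity(counts), precision(counts))
```

In screening tasks sensitivity usually matters most, since a missed eligible study (a false negative) cannot be recovered at later review stages, whereas a false positive only costs extra human checking.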
Yang, Z.; Lyng, G. D.; Batra, S. S.; Tillman, R. E.
Medical concept extraction from electronic health records underpins many downstream applications, yet remains challenging because medically meaningful concepts, such as diagnoses, are frequently implied rather than explicitly stated in medical narratives. Existing benchmarks with human-annotated evidence spans underscore the importance of grounding extracted concepts in medical text. However, they predominantly focus on explicitly stated concepts and provide limited coverage of cases in which medically relevant concepts must be inferred. We present MedicalBench, a new benchmark for medical concept extraction with evidence grounding that evaluates implicit medical reasoning. MedicalBench formulates medical concept extraction as a verification task over medical note-concept pairs, coupled with sentence-level evidence identification. Built from MIMIC-IV discharge summaries and human-verified ICD-10 codes, the dataset is curated through a multi-stage large language model (LLM) triage pipeline followed by medical annotation and expert review. It deliberately includes implicit positives, semantically confusable negatives, and cases where LLM judgments disagree with medical expert assessments. Annotators provide sentence-level evidence spans and concise medical rationales. The final dataset contains 823 high-quality examples. We define two complementary evaluation tasks: (1) medical concept extraction and (2) sentence-level evidence retrieval, enabling assessment of both correctness and interpretability. Benchmarking state-of-the-art LLMs and a supervised baseline reveals that performance remains modest, highlighting the difficulty of extracting implicitly expressed concepts.
We further show that explicitly incorporating reasoning cues and prompting to extract implicit evidence substantially improves medical concept extractions, while performance is largely invariant to note length, indicating that MedicalBench isolates reasoning difficulty rather than superficial confounders. MedicalBench provides the first systematic benchmark for implicit, evidence-grounded medical concept extraction, offering a foundation for developing medical language models that can both identify medically relevant concepts and justify their predictions in a transparent and medically faithful manner.
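The two evaluation tasks described above could be scored along the following lines. This is only a minimal sketch: the abstract does not state which metrics MedicalBench uses, so accuracy for the verification task and set-based precision, recall, and F1 over sentence indices for evidence retrieval are assumptions, and the function names are hypothetical.

```python
def verification_accuracy(predictions, labels):
    """Fraction of note-concept pairs whose predicted verdict matches the gold verdict."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def evidence_prf(predicted_sentences, gold_sentences):
    """Set-based precision, recall, and F1 over sentence indices (assumed metric)."""
    pred, gold = set(predicted_sentences), set(gold_sentences)
    true_positives = len(pred & gold)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative values only, not taken from the benchmark.
acc = verification_accuracy([True, False, True, True], [True, False, False, True])
p, r, f1 = evidence_prf(predicted_sentences=[3, 7, 9], gold_sentences=[7, 9, 12])
```

Scoring the two tasks separately reflects the distinction the abstract draws between correctness (is the concept supported by the note?) and interpretability (which sentences support it?).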
Purkayastha, D. S.
Inadequate discharge communication is a well-documented contributor to medication non-adherence, missed follow-ups, and preventable readmissions across healthcare systems worldwide. In resource-limited oncology settings, where patients are often low-literate, speak non-dominant languages, and manage complex multi-drug regimens, this problem is acute and largely unaddressed. We present Aakhyan, a vernacular patient communication platform that addresses the full post-discharge arc: from converting English-language discharge summaries into structured, voice-based vernacular explanations, through medication adherence support, to proactive follow-up management, all delivered via WhatsApp. The architecture is novel in its strict separation of concerns: a vision-language model performs structured JSON extraction from discharge images, while all patient-facing content is generated deterministically from clinician-approved templates with community-sensitive vocabulary registers. This design eliminates the hallucination risk inherent in generative AI patient communication (documented at 18-82% in prior studies) while preserving the extraction capability of large language models. The platform supports four language registers (Bengali, Hindi, simplified English for tribal populations, and Assamese), with text-to-speech synthesis across all registers, including a custom grapheme-to-phoneme engine developed for Assamese phonology. Beyond discharge communication, the platform includes scheduled medication adherence nudges, interactive follow-up reminders, and a Daily Availability and Patient Notification System (DAPNS) that notifies patients the evening before their follow-up whether their doctor and required investigations are available, preventing wasted trips by rural patients who travel 2-6 hours to reach the centre. A 100-patient stratified randomised controlled study is planned at Silchar Cancer Centre, with structured teach-back assessment at 48-72 hours post-discharge as the primary comprehension outcome and preliminary clinical efficacy as a secondary objective. This paper describes the clinical rationale, technical architecture, safety framework, and positioning of Aakhyan within the existing literature on mHealth patient communication interventions.
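The separation of concerns described above (model-based extraction to JSON, then deterministic rendering from approved templates) might look roughly like the sketch below. It is not Aakhyan's implementation: the JSON field names, the template wording, the register key, and `render_instructions` are all hypothetical, chosen only to show that the patient-facing text is produced without any generative step.

```python
import json
from string import Template

# Hypothetical clinician-approved templates, keyed by language register.
TEMPLATES = {
    "simplified_english": Template(
        "Take $drug, $dose, $times times a day for $days days."
    ),
}

# Hypothetical schema for each medication entry in the extracted JSON.
REQUIRED_FIELDS = ("drug", "dose", "times", "days")


def render_instructions(extracted_json, register):
    """Fill a fixed template from extracted fields; fail loudly on missing data."""
    data = json.loads(extracted_json)
    template = TEMPLATES[register]
    lines = []
    for med in data["medications"]:
        missing = [field for field in REQUIRED_FIELDS if field not in med]
        if missing:
            # Refuse to guess: incomplete extractions never reach the patient.
            raise ValueError(f"missing fields: {missing}")
        lines.append(template.substitute({f: med[f] for f in REQUIRED_FIELDS}))
    return lines


# Placeholder data standing in for the vision-language model's output.
extracted = '{"medications": [{"drug": "Tablet A", "dose": "4 mg", "times": 2, "days": 5}]}'
messages = render_instructions(extracted, "simplified_english")
```

Because the output is a pure function of the extracted fields and the template, any error can be traced either to the extraction step or to the template itself, which is the property the abstract's safety argument relies on.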
Malingumu, E. E.; Badaga, I.; Kisendi, D. D.; Pierre Kabore, R. W.; Yeremon, O. G.; Mohamed, M. A.; He, Q.
This study evaluates the feasibility of implementing artificial intelligence (AI)-driven disease surveillance systems at Julius Nyerere International Airport (JNIA) in Tanzania, a key hub for regional and international travel. Through a mixed-methods approach combining qualitative interviews and quantitative surveys, the research assesses the infrastructure, human resource capacity, and regulatory frameworks necessary for AI integration. Findings indicate that while Port Health Officers are strongly optimistic about AI's potential to enhance disease detection, the airport faces significant barriers, including outdated infrastructure, insufficient technical resources, and a lack of trained personnel. Ethical and privacy concerns, particularly surrounding data security, also emerged as key challenges, compounded by limited public awareness and the socio-cultural acceptability of AI systems. Furthermore, the study identifies gaps in national policies and inter-agency coordination that hinder the effective implementation of AI technologies. The research concludes that while current conditions render AI adoption infeasible, strategic investments in infrastructure, workforce training, and policy development could pave the way for future integration, enhancing public health surveillance at JNIA and potentially other airports in low- and middle-income countries. This study contributes critical insights into the barriers and opportunities for AI-driven disease surveillance in low-resource settings, focusing specifically on international airports as high-priority transit points. It emphasizes the importance of region-specific solutions to enhance health security in East Africa and supports the broader global health agenda by advocating for international collaboration and the development of scalable disease surveillance systems. Future research should explore pilot AI implementations at other airports to evaluate real-world challenges and refine AI systems for broader applicability, including cost-effectiveness analyses and integration of public perspectives on AI.
Tan, J.; Tang, P. H.
Background: Paediatric pneumonia is a leading cause of childhood morbidity and mortality worldwide. Chest X-rays (CXR) are an important tool in the diagnosis of pneumonia, but shortages in specialist radiology services lead to clinically significant delays in CXR reporting. Multimodal large language models (MLLMs) can communicate findings to both clinicians and laypersons, which allows them to be deployed throughout clinical workflows, from image analysis to patient communication. However, MLLMs currently underperform state-of-the-art deep learning classifiers. Objective: To evaluate the diagnostic accuracy of ensemble strategies with MLLMs compared to the baseline average agent for paediatric radiological pneumonia detection. Methods: We conducted a retrospective cohort study using paediatric CXRs from two independent hospital datasets totalling 2300 CXRs. Fifteen MedGemma-4B-it agents independently classified each CXR into five pneumonia likelihood categories. Majority voting, soft voting, and GPTOSS-20B aggregation were compared against the average agent performance. The primary metric evaluated was one-vs-rest (OvR) AUROC. Secondary metrics included accuracy, sensitivity, specificity, F1-score, Cohen's kappa, and one-vs-one (OvO) AUROC. Results: Soft voting achieved improvements in OvR AUROC (p_balanced = 0.0002, p_real-world = 0.0003), accuracy (p_balanced = 0.0008, p_real-world < 0.0001), Cohen's kappa (p_balanced = 0.0006, p_real-world = 0.0054) and OvO AUROC (p_balanced < 0.0001, p_real-world = 0.0011) across both datasets, and a superior F1-score (p_balanced = 0.0028) on the balanced dataset. Conclusion: Soft voting enhances MedGemma's diagnostic discriminatory performance for paediatric radiological pneumonia detection. Our system enables privacy-preserving, near real-time clinical decision support with explainable outputs, having potential for integration into emergency departments. Our system's high specificity supports triage by flagging high-risk radiological pneumonia cases.
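To illustrate the difference between the majority-voting and soft-voting strategies named above, here is a minimal sketch. It assumes each agent outputs a probability vector over the five likelihood categories; the abstract does not say how agent outputs were represented or combined, so the functions and numbers below are illustrative only.

```python
from collections import Counter


def majority_vote(agent_labels):
    """Return the most frequent label; ties resolve to the label seen first."""
    return Counter(agent_labels).most_common(1)[0][0]


def soft_vote(agent_probs):
    """Average per-class probabilities across agents; return (label, mean_probs)."""
    n_agents = len(agent_probs)
    n_classes = len(agent_probs[0])
    mean_probs = [
        sum(probs[c] for probs in agent_probs) / n_agents for c in range(n_classes)
    ]
    best = max(range(n_classes), key=lambda c: mean_probs[c])
    return best, mean_probs


# Three hypothetical agents, five categories (indices 0-4).
agent_probs = [
    [0.05, 0.10, 0.40, 0.38, 0.07],  # weakly prefers category 2
    [0.05, 0.10, 0.40, 0.38, 0.07],  # weakly prefers category 2
    [0.00, 0.00, 0.05, 0.60, 0.35],  # strongly prefers category 3
]
hard_labels = [max(range(5), key=lambda c: p[c]) for p in agent_probs]

majority_label = majority_vote(hard_labels)
soft_label, mean_probs = soft_vote(agent_probs)
```

In this example the two strategies disagree: majority voting picks category 2 because two agents narrowly favour it, whereas soft voting picks category 3 because averaging preserves each agent's confidence. The averaged probabilities also give a continuous score per class, which is what an AUROC computation needs.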
Pietilainen, O.; Salonsalmi, A.; Rahkonen, O.; Lahelma, E.; Lallukka, T.
Objectives: Longer lifespans lead to more time spent in retirement, despite efforts to raise the retirement age. It is therefore important to study how retirement years can be spent free of disease. This study examined socioeconomic and sociodemographic differences in healthy years spent in retirement. Methods: We followed a cohort of retired Finnish municipal employees (N=4231, average follow-up 15.4 years) in national administrative registers for major chronic diseases: cancer, coronary heart disease, cerebrovascular disease, diabetes, asthma or chronic obstructive pulmonary disease, dementia, mental disorders, and alcohol-related disorders. Median healthy years in retirement and age at first occurrence of illness (ICD-10 and ATC-based) in each combination of sex, occupational class, and age at retirement were predicted using Royston-Parmar models. Prevalence rates for each diagnostic group were calculated. Results: The most healthy years in retirement were spent by women who had worked in semi-professional jobs and retired at age 60-62 (median predicted healthy years 11.6, 95% CI 10.4-12.7). The fewest healthy years in retirement were spent by men who had worked in routine non-manual jobs and retired after age 62 (median predicted healthy years 6.5, 95% CI 4.4-9.5). Diabetes was slightly more common among women in lower occupational classes, and dementia among women in manual work who retired at age 60-62. Discussion: Healthy years in retirement are not enjoyed equally by women and men, nor by those who retire earlier or later. Policies aiming to increase the retirement age should consider the effects of these gaps on retirees and the equitability of those effects.